Publishing Microdata with a Robust Privacy Guarantee
Today, the publication of microdata poses a privacy threat. Vast research has
striven to define the privacy condition that microdata should satisfy before it
is released, and devise algorithms to anonymize the data so as to achieve this
condition. Yet, no method proposed to date explicitly bounds the percentage of
information an adversary gains after seeing the published data for each
sensitive value therein. This paper introduces beta-likeness, an appropriately
robust privacy model for microdata anonymization, along with two anonymization
schemes designed therefor, one based on generalization and the other based
on perturbation. Our model postulates that an adversary's confidence in the
likelihood of a certain sensitive-attribute (SA) value should not increase, in
relative difference terms, by more than a predefined threshold. Our techniques
aim to satisfy a given beta threshold with little information loss. We
experimentally demonstrate that (i) our model provides an effective privacy
guarantee in a way that predecessor models cannot, (ii) our generalization
scheme is more effective and efficient in its task than methods adapting
algorithms for the k-anonymity model, and (iii) our perturbation method
outperforms a baseline approach. Moreover, we discuss in detail the resistance
of our model and methods to attacks proposed in previous research.
Comment: VLDB201
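The condition described in the abstract above can be sketched as a simple check. This is a minimal illustration, assuming the "relative difference" is (q - p) / p, where p is the frequency of an SA value in the whole table and q is its frequency inside one published group; the function name `satisfies_beta_likeness` and the sample data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def satisfies_beta_likeness(table_values, group_values, beta):
    """Check one group against a beta threshold (assumed reading of the model).

    table_values: SA values of the whole table (adversary's prior).
    group_values: SA values of one published group (adversary's posterior).
    Returns True if no SA value's frequency grows, in relative terms,
    by more than beta when moving from the table to the group.
    """
    overall = Counter(table_values)
    group = Counter(group_values)
    n, m = len(table_values), len(group_values)
    for value, count in group.items():
        if overall[value] == 0:
            raise ValueError(f"group value {value!r} does not occur in the table")
        p = overall[value] / n  # prior confidence
        q = count / m           # posterior confidence
        # Only increases in confidence count as information gain here.
        if q > p and (q - p) / p > beta:
            return False
    return True


table = ["flu"] * 6 + ["hiv"] * 2 + ["cold"] * 2
group = ["flu", "flu", "hiv", "cold"]
print(satisfies_beta_likeness(table, group, beta=0.5))
print(satisfies_beta_likeness(table, group, beta=0.2))
```

In this sample, "hiv" rises from a frequency of 0.2 in the table to 0.25 in the group, a relative gain of about 0.25, so the group passes with a threshold of 0.5 and fails with a threshold of 0.2.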
NetLSD: Hearing the Shape of a Graph
Comparison among graphs is ubiquitous in graph analytics. However, it is a
hard task in terms of the expressiveness of the employed similarity measure and
the efficiency of its computation. Ideally, graph comparison should be
invariant to the order of nodes and the sizes of compared graphs, adaptive to
the scale of graph patterns, and scalable. Unfortunately, these properties have
not been addressed together. Graph comparisons still rely on direct approaches,
graph kernels, or representation-based methods, which are all inefficient and
impractical for large graph collections.
In this paper, we propose the Network Laplacian Spectral Descriptor (NetLSD):
the first, to our knowledge, permutation- and size-invariant, scale-adaptive,
and efficiently computable graph representation method that allows for
straightforward comparisons of large graphs. NetLSD extracts a compact
signature that inherits the formal properties of the Laplacian spectrum,
specifically its heat or wave kernel; thus, it hears the shape of a graph. Our
evaluation on a variety of real-world graphs demonstrates that it outperforms
previous works in both expressiveness and efficiency.
Comment: KDD '18: The 24th ACM SIGKDD International Conference on Knowledge
Discovery & Data Mining, August 19--23, 2018, London, United Kingdo
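A heat-kernel signature of the kind the abstract mentions can be sketched as the heat trace tr(exp(-tL)) evaluated at several timescales t. The sketch below assumes the normalized Laplacian and uses a truncated Taylor series only to stay dependency-free; a practical implementation would more likely use an eigenvalue solver. The helper names and the choice of timescales are hypothetical.

```python
import math


def normalized_laplacian(adj):
    """L = I - D^{-1/2} A D^{-1/2} for a dense symmetric adjacency matrix."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                lap[i][j] = 1.0 if deg[i] > 0 else 0.0
            elif adj[i][j]:
                lap[i][j] = -adj[i][j] / math.sqrt(deg[i] * deg[j])
    return lap


def heat_trace(lap, t, terms=40):
    """tr(exp(-t L)) via the truncated series sum_k tr((-t L)^k) / k!."""
    n = len(lap)
    # term holds (-t L)^k / k!, starting with the identity for k = 0.
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = float(n)
    for k in range(1, terms):
        term = [
            [sum(term[i][m] * lap[m][j] for m in range(n)) * (-t) / k
             for j in range(n)]
            for i in range(n)
        ]
        total += sum(term[i][i] for i in range(n))
    return total


def heat_signature(adj, timescales=(0.1, 1.0, 2.0)):
    """Vector of heat traces, one entry per timescale."""
    lap = normalized_laplacian(adj)
    return [heat_trace(lap, t) for t in timescales]


path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]        # path a-b-c
relabeled = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]   # same path, center listed first
print(heat_signature(path))
print(heat_signature(relabeled))
```

Because the trace depends only on the Laplacian spectrum, the two node orderings of the same path yield the same signature up to floating-point error, which illustrates the permutation invariance claimed in the abstract.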
Stochastic Database Cracking: Towards Robust Adaptive Indexing in Main-Memory Column-Stores
Modern business applications and scientific databases call for inherently
dynamic data storage environments. Such environments are characterized by two
challenging features: (a) they have little idle system time to devote to
physical design; and (b) there is little, if any, a priori workload knowledge,
while the query and data workload keeps changing dynamically. In such
environments, traditional approaches to index building and maintenance cannot
apply. Database cracking has been proposed as a solution that allows on-the-fly
physical data reorganization, as a collateral effect of query processing.
Cracking aims to continuously and automatically adapt indexes to the workload
at hand, without human intervention. Indexes are built incrementally,
adaptively, and on demand. Nevertheless, as we show, existing adaptive indexing
methods fail to deliver workload-robustness; they perform much better with
random workloads than with others. This frailty derives from the inelasticity
with which these approaches interpret each query as a hint on how data should
be stored. Current cracking schemes blindly reorganize the data within each
query's range, even if that results in successive expensive operations with
minimal indexing benefit. In this paper, we introduce stochastic cracking, a
significantly more resilient approach to adaptive indexing. Stochastic cracking
also uses each query as a hint on how to reorganize data, but not blindly so;
it gains resilience and avoids performance bottlenecks by deliberately applying
certain arbitrary choices in its decision-making. Thereby, we bring adaptive
indexing forward to a mature formulation that confers the workload-robustness
previous approaches lacked. Our extensive experimental study verifies that
stochastic cracking maintains the desired properties of original database
cracking while at the same time it performs well with diverse realistic
workloads.
Comment: VLDB201
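The idea of reorganizing data as a side effect of range queries, with an added random choice, can be sketched as follows. This is a toy in-memory illustration under stated assumptions, not the paper's algorithm: the class `CrackedColumn`, its methods, and the policy of first cracking on a randomly chosen element of the affected piece are all hypothetical.

```python
import bisect
import random


class CrackedColumn:
    """Toy column that is partitioned incrementally by the queries it serves."""

    def __init__(self, values, seed=0):
        self.data = list(values)
        self.pivots = []     # sorted, distinct pivot values
        self.positions = []  # positions[i]: first index holding a value >= pivots[i]
        self.rng = random.Random(seed)

    def _piece(self, value):
        """Return [lo, hi) bounds of the unsorted piece that may hold value."""
        i = bisect.bisect_right(self.pivots, value)
        lo = self.positions[i - 1] if i > 0 else 0
        hi = self.positions[i] if i < len(self.pivots) else len(self.data)
        return lo, hi

    def crack(self, pivot):
        """Partition the piece around pivot; return the index of the split."""
        i = bisect.bisect_left(self.pivots, pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.positions[i]  # already cracked on this value
        lo, hi = self._piece(pivot)
        piece = self.data[lo:hi]
        smaller = [v for v in piece if v < pivot]
        larger = [v for v in piece if v >= pivot]
        self.data[lo:hi] = smaller + larger
        split = lo + len(smaller)
        self.pivots.insert(i, pivot)
        self.positions.insert(i, split)
        return split

    def range_query(self, low, high, stochastic=True):
        """Return all values v with low <= v < high (assumes low <= high)."""
        if stochastic:
            # Assumed policy: also crack each affected piece on a random element,
            # so piece sizes shrink regardless of where the query bounds fall.
            for bound in (low, high):
                lo, hi = self._piece(bound)
                if hi - lo > 1:
                    self.crack(self.data[self.rng.randrange(lo, hi)])
        start = self.crack(low)
        end = self.crack(high)
        return self.data[start:end]


column = CrackedColumn([13, 4, 55, 9, 27, 1, 36, 18])
print(sorted(column.range_query(5, 30)))
print(column.pivots)
```

With `stochastic=False` the column is cracked only at the query bounds, which is the behavior the abstract describes as blind reorganization; the random cracks are one way to keep a skewed query sequence from repeatedly scanning a large unindexed piece.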
Mechanistic characterization of p62 as a driver of melanoma metastasis
Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Facultad de Medicina, Departamento de Bioquímica. Date of defense: 07-07-2017
Access to the full text of this thesis is embargoed until 07-01-201
VERSE: Versatile Graph Embeddings from Similarity Measures
Embedding a web-scale information network into a low-dimensional vector space
facilitates tasks such as link prediction, classification, and visualization.
Past research has addressed the problem of extracting such embeddings by
adopting methods from words to graphs, without defining a clearly
comprehensible graph-related objective. Yet, as we show, the objectives used in
past works implicitly utilize similarity measures among graph nodes.
In this paper, we carry the similarity orientation of previous works to its
logical conclusion; we propose VERtex Similarity Embeddings (VERSE), a simple,
versatile, and memory-efficient method that derives graph embeddings explicitly
calibrated to preserve the distributions of a selected vertex-to-vertex
similarity measure. VERSE learns such embeddings by training a single-layer
neural network. While its default, scalable version does so via sampling
similarity information, we also develop a variant using the full information
per vertex. Our experimental study on standard benchmarks and real-world
datasets demonstrates that VERSE, instantiated with diverse similarity
measures, outperforms state-of-the-art methods in terms of precision and recall
in major data mining tasks and supersedes them in time and space efficiency,
while the scalable sampling-based variant achieves results as good as those of
the non-scalable full variant.
Comment: In WWW 2018: The Web Conference. 10 pages, 5 figure
Spectral Graph Complexity
We introduce a spectral notion of graph complexity derived from Weyl's
law. We experimentally demonstrate its correlation to how well the graph can be
embedded in a low-dimensional Euclidean space.
Comment: BigNet workshop at the Web conference'201